Minimal Residual Approaches for Policy Evaluation in Large Sparse Markov Chains

Authors

  • Hengshuai Yao
  • Zhi-Qiang Liu

Abstract

We consider the problem of policy evaluation in a special class of Markov Decision Processes (MDPs) whose underlying Markov chains are large and sparse. We start from a stationary model equation that the limit of Temporal Difference (TD) learning satisfies, and develop a Robbins-Monro method that consistently estimates its coefficients. We then introduce minimal residual approaches, which solve an approximate version of the stationary model equation. Incremental least-squares temporal difference (iLSTD) learning is shown to be a special case of these approaches. We also develop a new algorithm, the minimal residual (MR) algorithm, whose step-size is computed online. Using the Compressed Sparse Row (CSR) format, we reduce the complexity of MR to nearly that of TD. MR has data efficiency and computational efficiency comparable to those of iLSTD but, unlike iLSTD, does not require a manually selected step-size.
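The abstract combines three ingredients: Robbins-Monro estimates of the coefficients of the stationary model equation, a minimal residual solve whose step-size needs no tuning, and CSR storage that keeps each matrix-vector product proportional to the number of nonzeros. The following minimal Python sketch shows how these pieces can fit together under assumptions of our own, not the authors' implementation: linear features stored as sparse dicts, the TD(0) (λ = 0) fixed-point equation written as Aθ + b = 0 with A = Σ φ(γφ′ − φ)ᵀ and b = Σ φr, and the classical minimal residual iteration from numerical linear algebra. The helper names estimate_coefficients, to_csr, csr_matvec, dot and minimal_residual are hypothetical. Unlike the online MR algorithm described above, the sketch estimates the coefficients from a fixed batch of transitions before iterating.

```python
from collections import defaultdict


def estimate_coefficients(transitions, num_features, gamma):
    """Accumulate A = sum phi (gamma * phi_next - phi)^T and b = sum phi * r.

    Each transition is (phi, reward, phi_next), where phi and phi_next are
    sparse feature vectors stored as {feature_index: value}. Dividing both
    sums by the sample count t gives the 1/t Robbins-Monro running averages;
    that common factor does not change the solution of A theta = -b, so the
    raw sums are kept here.
    """
    A = defaultdict(float)  # sparse accumulator: (row, col) -> value
    b = [0.0] * num_features
    for phi, reward, phi_next in transitions:
        for i, phi_i in phi.items():
            b[i] += phi_i * reward
            for j, v in phi_next.items():
                A[(i, j)] += phi_i * gamma * v
            for j, v in phi.items():
                A[(i, j)] -= phi_i * v
    return A, b


def to_csr(entries, num_rows):
    """Pack a {(row, col): value} map into CSR arrays (values, col_idx, row_ptr)."""
    by_row = defaultdict(list)
    for (i, j), v in entries.items():
        if v != 0.0:
            by_row[i].append((j, v))
    values, col_idx, row_ptr = [], [], [0]
    for i in range(num_rows):
        for j, v in sorted(by_row[i]):
            col_idx.append(j)
            values.append(v)
        row_ptr.append(len(values))
    return values, col_idx, row_ptr


def csr_matvec(csr, x):
    """Return A x; the cost is proportional to the number of stored nonzeros."""
    values, col_idx, row_ptr = csr
    y = [0.0] * (len(row_ptr) - 1)
    for i in range(len(y)):
        for k in range(row_ptr[i], row_ptr[i + 1]):
            y[i] += values[k] * x[col_idx[k]]
    return y


def dot(u, v):
    return sum(a * c for a, c in zip(u, v))


def minimal_residual(csr, b, num_iters=1000, tol=1e-10):
    """Solve A theta = -b with the classical minimal residual iteration.

    alpha = <A rho, rho> / <A rho, A rho> minimizes ||rho - alpha * A rho||_2,
    so the step-size is computed from the data instead of being hand-tuned.
    """
    theta = [0.0] * len(b)
    rho = [-bi for bi in b]  # residual -b - A theta at theta = 0
    for _ in range(num_iters):
        if dot(rho, rho) <= tol * tol:
            break
        a_rho = csr_matvec(csr, rho)
        denom = dot(a_rho, a_rho)
        if denom == 0.0:  # A is singular along rho; no further progress
            break
        alpha = dot(a_rho, rho) / denom
        theta = [t + alpha * r for t, r in zip(theta, rho)]
        rho = [r - alpha * ar for r, ar in zip(rho, a_rho)]
    return theta


# Deterministic 3-state cycle 0 -> 1 -> 2 -> 0 with one-hot features and
# reward 1 when leaving state 0; the exact values are 8/7, 2/7 and 4/7.
gamma = 0.5
rewards = {0: 1.0, 1: 0.0, 2: 0.0}
transitions = [({s: 1.0}, rewards[s], {(s + 1) % 3: 1.0})
               for _ in range(100) for s in range(3)]
A, b = estimate_coefficients(transitions, 3, gamma)
theta = minimal_residual(to_csr(A, 3), b)
print([round(v, 4) for v in theta])  # [1.1429, 0.2857, 0.5714]
```

The iteration converges in this example because −A is positive definite for this on-policy chain; the classical minimal residual method is guaranteed to converge when the symmetric part of the matrix is definite, a condition the sketch does not check.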

Similar resources

Empirical Bayes Estimation in Nonstationary Markov chains

Estimation procedures for nonstationary Markov chains appear to be relatively sparse. This work introduces empirical Bayes estimators for the transition probability matrix of a finite nonstationary Markov chain. The data are assumed to be of a panel study type in which each data set consists of a sequence of observations on N ≥ 2 independent and identically dis...

Evaluation of First and Second Markov Chains Sensitivity and Specificity as Statistical Approach for Prediction of Sequences of Genes in Virus Double Strand DNA Genomes

The growing amount of information on biological sequences has made the application of statistical approaches necessary for modeling and estimating their functions. In this paper, the sensitivity and specificity of first- and second-order Markov chains for gene prediction were evaluated using complete double-stranded DNA virus genomes. There were two approaches for prediction of each Markov model parameter,...

Fast multilevel methods for Markov chains

This paper describes multilevel methods for the calculation of the stationary probability vector of large, sparse, irreducible Markov chains. In particular, several recently proposed significant improvements to the multilevel aggregation method of Horton and Leutenegger are described and compared. Furthermore, we propose a very simple improvement of that method using an over-correction mechanis...

Distributed solving of Markov chains for computer network models

In this paper a distributed iterative GMRES algorithm for solving huge and sparse linear systems (that appear in the Markov chain analysis of queueing network models) is considered. It is implemented using the MPI standard on a collection of Linux machines and the emphasis is put upon the size of linear systems being solved and possibility of storing huge and sparse matrices as well as huge vec...

Preconditioned Generalized Minimal Residual Method for Solving Fractional Advection-Diffusion Equation

Fractional differential equations (FDEs) have attracted much attention and have been widely used in fields such as finance, physics, image processing, and biology. It is not always possible to find an analytical solution for such equations. Approximate solutions or numerical schemes may be a good approach, particularly schemes from numerical linear algebra for solving ...

Publication date: 2008